Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression

نویسندگان

  • Ning Xu
  • Jian Hong
  • Timothy C. G. Fisher
چکیده

In this paper, we study the generalization ability (GA)—the ability of a model to predict outcomes in new samples from the same population—of the extremum estimators. By adapting the classical concentration inequalities, we propose upper bounds for the empirical out-ofsample prediction error for extremum estimators, which is a function of the in-sample error, the severity of heavy tails, the sample size of in-sample data and model complexity. The error bounds not only serve to measure GA, but also to illustrate the trade-off between insample and out-of-sample fit, which is connected to the traditional bias-variance trade-off. Moreover, the bounds also reveal that the hyperparameter K, the number of folds in K-fold cross-validation, cause the bias-variance trade-off for cross-validation error, which offers a route to hyperparameter optimization in terms of GA. As a direct application of GA analysis, we implement the new upper bounds in penalized regression estimates for both n≥ p and n < p cases. We show that the L2 norm difference between penalized and un-penalized regression estimates can be directly explained by the GA of the regression estimates and the GA of empirical moment conditions. Finally, we show that all the penalized regression estimates are L2 consistent based on GA analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Use of Two Smoothing Parameters in Penalized Spline Estimator for Bi-variate Predictor Non-parametric Regression Model

Penalized spline criteria involve the function of goodness of fit and penalty, which in the penalty function contains smoothing parameters. It serves to control the smoothness of the curve that works simultaneously with point knots and spline degree. The regression function with two predictors in the non-parametric model will have two different non-parametric regression functions. Therefore, we...

متن کامل

Penalized Quantile Regression Estimation for a Model with Endogenous Individual Effects

Abstract. This paper proposes a penalized quantile regression estimator for panel data that explicitly considers individual heterogeneity associated with the covariates. We provide conditions under which the estimator is asymptotically Gaussian, and the harshness of the penalization can be determined by minimizing asymptotic mean squared error. We investigate finite sample and asymptotic perfor...

متن کامل

Penalized Bregman Divergence Estimation via Coordinate Descent

Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron, et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman, et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...

متن کامل

Asymptotic algorithm for computing the sample variance of interval data

The problem of the sample variance computation for epistemic inter-val-valued data is, in general, NP-hard. Therefore, known efficient algorithms for computing variance require strong restrictions on admissible intervals like the no-subset property or heavy limitations on the number of possible intersections between intervals. A new asymptotic algorithm for computing the upper bound of the samp...

متن کامل

Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data

Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1609.03344  شماره 

صفحات  -

تاریخ انتشار 2016